
    Self-imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training

    Self-imitating feedback is an effective and learner-friendly method for non-native learners in Computer-Assisted Pronunciation Training. Acoustic characteristics of native utterances are extracted, transplanted onto the learner's own speech, and returned to the learner as corrective feedback. Previous work focused on speech conversion using prosodic transplantation techniques based on the PSOLA algorithm. Motivated by the visual differences between spectrograms of native and non-native speech, we investigate applying a GAN to generate self-imitating feedback, exploiting the generator's ability learned through adversarial training. Because this mapping is highly under-constrained, we also adopt a cycle consistency loss to encourage the output to preserve the global structure shared by native and non-native utterances. Trained on 97,200 spectrogram images of short utterances produced by native and non-native speakers of Korean, the generator successfully transforms non-native spectrogram input into a spectrogram with the properties of self-imitating feedback. Furthermore, the transformed spectrogram shows segmental corrections that cannot be obtained by prosodic transplantation. A perceptual test comparing the self-imitating and corrective abilities of our method with the baseline PSOLA method shows that the generative approach with cycle consistency loss is promising.
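    As a rough illustration of the cycle-consistency idea above, here is a minimal PyTorch sketch, assuming two hypothetical generators G (non-native to native) and F (native to non-native) that map spectrogram tensors; the weight lam is an illustrative choice, not a value from the paper.

```python
import torch
import torch.nn as nn

def cycle_consistency_loss(G: nn.Module, F: nn.Module,
                           x_nonnative: torch.Tensor,
                           y_native: torch.Tensor,
                           lam: float = 10.0) -> torch.Tensor:
    """L1 penalty tying the two mapping directions together, so that
    translated spectrograms keep the global structure of the input."""
    # Forward cycle: non-native -> native-like -> back to non-native.
    forward_cycle = torch.mean(torch.abs(F(G(x_nonnative)) - x_nonnative))
    # Backward cycle: native -> non-native-like -> back to native.
    backward_cycle = torch.mean(torch.abs(G(F(y_native)) - y_native))
    return lam * (forward_cycle + backward_cycle)
```

    This term is added to the usual adversarial losses of the two discriminators; it is the constraint that keeps the under-determined mapping from drifting away from the input's content.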

    Automatic Severity Assessment of Dysarthric speech by using Self-supervised Model with Multi-task Learning

    Automatic assessment of dysarthric speech is essential for sustained treatment and rehabilitation. However, obtaining atypical speech is challenging, often leading to data scarcity. To tackle the problem, we propose a novel automatic severity assessment method for dysarthric speech that uses a self-supervised model in conjunction with multi-task learning. Wav2vec 2.0 XLS-R is jointly trained on two tasks: severity level classification and auxiliary automatic speech recognition (ASR). For the baseline experiments, we employ hand-crafted features such as eGeMAPS and linguistic features with SVM, MLP, and XGBoost classifiers. Evaluated on the Korean dysarthric speech QoLT database, our model outperforms the traditional baselines, with a relative increase of 4.79% in classification accuracy. In addition, the proposed model surpasses the model trained without the ASR head, achieving a 10.09% relative improvement. Furthermore, we show how multi-task learning affects severity classification performance by analyzing the latent representations and the regularization effect.
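    The joint training described above can be sketched roughly as follows; the encoder stands in for wav2vec 2.0 XLS-R, and all dimensions, the vocabulary size, and the 0.1 task weight are illustrative assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskSeverityModel(nn.Module):
    """Shared self-supervised encoder with a severity-classification head
    and an auxiliary CTC-based ASR head."""
    def __init__(self, encoder: nn.Module, hidden_dim: int = 1024,
                 num_severity_levels: int = 5, vocab_size: int = 70):
        super().__init__()
        self.encoder = encoder                              # e.g. XLS-R
        self.severity_head = nn.Linear(hidden_dim, num_severity_levels)
        self.asr_head = nn.Linear(hidden_dim, vocab_size)   # CTC over units
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, waveform, severity_labels, targets, target_lens):
        feats = self.encoder(waveform)          # (batch, frames, hidden_dim)
        # Severity task: mean-pool frames, then classify.
        cls_logits = self.severity_head(feats.mean(dim=1))
        cls_loss = F.cross_entropy(cls_logits, severity_labels)
        # Auxiliary ASR task: frame-wise CTC loss.
        log_probs = self.asr_head(feats).log_softmax(-1).transpose(0, 1)
        input_lens = torch.full((feats.size(0),), feats.size(1),
                                dtype=torch.long)
        asr_loss = self.ctc(log_probs, targets, input_lens, target_lens)
        # Joint objective: the auxiliary ASR head acts as a regularizer.
        return cls_loss + 0.1 * asr_loss
```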

    Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification

    This paper proposes an improved Goodness of Pronunciation (GoP) measure that utilizes Uncertainty Quantification (UQ) for automatic speech intelligibility assessment of dysarthric speech. Current GoP methods rely heavily on overconfident neural network predictions, which are unsuitable for assessing dysarthric speech due to its significant acoustic differences from healthy speech. To alleviate the problem, UQ techniques are applied to GoP by 1) normalizing the phoneme prediction (entropy, margin, maxlogit, logit-margin) and 2) modifying the scoring function (scaling, prior normalization). As a result, prior-normalized maxlogit GoP achieves the best performance, with relative increases of 5.66%, 3.91%, and 23.65% over the baseline GoP for English, Korean, and Tamil, respectively. Furthermore, a phoneme analysis is conducted to identify which phoneme scores significantly correlate with intelligibility scores in each language.
    Accepted to Interspeech 202
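    A prior-normalized maxlogit GoP can be sketched along the following lines; this is one plausible reading of the combination, not necessarily the paper's exact formulation, and the array shapes are assumptions.

```python
import numpy as np

def gop_maxlogit_prior_norm(logits: np.ndarray, phone: int,
                            log_prior: float) -> float:
    """GoP for one force-aligned segment: per frame, the canonical phone's
    logit minus the strongest competitor's logit (maxlogit normalization),
    averaged and shifted by the phone's log prior (prior normalization)."""
    per_frame = []
    for frame in logits:                 # frame: (num_phones,) raw logits
        target = frame[phone]
        competitors = np.delete(frame, phone)
        per_frame.append(target - competitors.max())
    return float(np.mean(per_frame) - log_prior)
```

    Working on raw logits rather than softmax posteriors is the point of the maxlogit variant: it sidesteps the saturation that makes posterior-based GoP overconfident on atypical speech.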

    Automatic Pronunciation Assessment of Korean Spoken by L2 Learners Using Best Feature Set Selection

    This paper proposes a method for automatic pronunciation assessment of Korean spoken by L2 learners, which selects the best feature set from a collection of the most well-known features in the literature. The L2 Korean Speech Corpus is used for assessment modeling; the native languages of the L2 learners are English, Chinese, Japanese, Russian, and Mongolian. In our system, learners' speech is forced-aligned and recognized using a native Korean acoustic model. Based on these results, various features for pronunciation assessment are computed and divided into four categories: RATE, SEGMENT, SILENCE, and GOP. Pronunciation scores produced by combining the feature categories with multiple linear regression serve as a baseline. To improve on the baseline, relevant features are selected using Principal Component Regression (PCR) and Best Subset Selection (BSS), respectively. The results show that the BSS model outperforms both the baseline and the PCR model, and that features related to speech segments and rate are selected as the relevant ones for automatic pronunciation assessment. The observed tendency of salient features will be useful for further improvement of automatic pronunciation assessment models for Korean language learners.
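    Best Subset Selection amounts to an exhaustive search over feature combinations under the same linear model as the baseline; here is a minimal sketch, assuming a feature matrix X whose columns are drawn from the four categories and human-rated scores y (all names illustrative).

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def best_subset(X: np.ndarray, y: np.ndarray, feature_names):
    """Try every feature subset; keep the one with the best
    cross-validated R^2 under multiple linear regression."""
    best_score, best_features = -np.inf, None
    for k in range(1, X.shape[1] + 1):
        for idx in combinations(range(X.shape[1]), k):
            cols = list(idx)
            score = cross_val_score(LinearRegression(), X[:, cols], y,
                                    scoring="r2", cv=5).mean()
            if score > best_score:
                best_score = score
                best_features = [feature_names[i] for i in cols]
    return best_features, best_score
```

    The search is exponential in the number of features, which is why BSS is typically run over a modest, pre-grouped feature pool like the one described here.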

    Analysis on Difference between Speaking Rates of Phoneme Classes and Oral Proficiency of Korean English Learners

    No full text

    Automatic Fluency Assessment of Korean Learners of English Using Articulation-Based Phoneme-Level Posterior Probabilities

    No full text

    Priorities for Segmental Pronunciation Instruction in a Korean CAPT System: Focusing on Variation Patterns of Chinese- and Japanese-Speaking Learners

    No full text

    Assistive Program for Automatic Speech Transcription based on G2P conversion and Speech Recognition

    No full text

    Optimizing Vocabulary Modeling for Dysarthric Speech Recognition

    No full text
    Imperfect articulation in dysarthric speech degrades speech recognition performance. In this paper, the effect of phoneme articulation classes on dysarthric speech recognition results is analyzed using generalized linear mixed models (GLMMs). The model whose features are categorized according to the manner of articulation and the place of the tongue is selected as the best one by the analysis. A recognition accuracy score for each word is then predicted from its pronunciation with the GLMM. The vocabulary optimized by selecting words with the maximum predicted score yields a 16.4% relative error reduction in dysarthric speech recognition.
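    The selection step can be sketched as follows; note that a plain logistic GLM stands in here for the paper's GLMM (a mixed-effects fit would need, e.g., statsmodels or R's lme4), and the feature layout, counts of phonemes per articulation class per word, is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def optimize_vocabulary(train_feats, recognized, cand_feats, cand_words,
                        vocab_size):
    """Fit a recognition-accuracy model on per-word articulation-class
    features, then keep the candidate words with the highest predicted
    probability of being recognized correctly."""
    # recognized: 0/1 outcome per training word token.
    model = LogisticRegression(max_iter=1000).fit(train_feats, recognized)
    scores = model.predict_proba(cand_feats)[:, 1]   # P(word recognized)
    top = np.argsort(scores)[::-1][:vocab_size]
    return [cand_words[i] for i in top]
```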